Abelian-Square-Rich Words
An abelian square is the concatenation of two words that are anagrams of one
another. A word of length n can contain at most Θ(n²) distinct
factors, and there exist words of length n containing Θ(n²) distinct
abelian-square factors, that is, distinct factors that are abelian squares.
This motivates us to study infinite words such that the number of distinct
abelian-square factors of length n grows quadratically with n. More
precisely, we say that an infinite word w is abelian-square-rich if,
for every n, every factor of w of length n contains, on average, a number
of distinct abelian-square factors that is quadratic in n; and uniformly
abelian-square-rich if every factor of w contains a number of distinct
abelian-square factors that is proportional to the square of its length. Of
course, if a word is uniformly abelian-square-rich, then it is
abelian-square-rich, but we show that the converse is not true in general. We
prove that the Thue-Morse word is uniformly abelian-square-rich and that the
function counting the number of distinct abelian-square factors of length 2n
of the Thue-Morse word is 2-regular. As for Sturmian words, we prove that a
Sturmian word of angle α is uniformly abelian-square-rich
if and only if the irrational α has bounded partial quotients, that is,
if and only if the word has bounded exponent.
Comment: To appear in Theoretical Computer Science. Corrected a flaw in the
proof of a proposition.
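As an illustration of the central definition (not code from the paper), the following minimal brute-force sketch checks whether a word is an abelian square, i.e. whether its two halves are anagrams, and enumerates the distinct abelian-square factors of a word:

```python
from collections import Counter

def is_abelian_square(w):
    """True if w is the concatenation of two anagrams, e.g. 'abba' = 'ab' + 'ba'."""
    n = len(w)
    if n % 2:
        return False
    return Counter(w[:n // 2]) == Counter(w[n // 2:])

def distinct_abelian_square_factors(w):
    """Set of distinct non-empty factors of w that are abelian squares."""
    return {w[i:j] for i in range(len(w))
            for j in range(i + 2, len(w) + 1, 2)   # even lengths only
            if is_abelian_square(w[i:j])}
```

For example, "abba" contains exactly two distinct abelian-square factors, "bb" and "abba" itself. This quadratic-time enumeration is only meant to make the definition concrete; the paper's results concern the asymptotic growth of these counts.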
The Rightmost Equal-Cost Position Problem
LZ77-based compression schemes compress the input text by replacing factors
of the text with an encoded reference to a previous occurrence, formed by the
pair (length, offset). For a given factor, the smaller the offset, the fewer
bits the reference needs, and hence the better the resulting compression
ratio. This is optimally achieved by using the rightmost occurrence of a
factor in the previous text. Given a cost function, for instance the minimum
number of bits needed to represent an integer, we define the Rightmost
Equal-Cost Position (REP) problem as the problem of finding an occurrence of
a factor whose cost is equal to the cost of the rightmost one. We present the
Multi-Layer Suffix Tree, a data structure that, for a text of length n, at
any time i provides REP(LPF) in constant time, where LPF is the longest
previous factor, i.e. the greedy phrase; a reference to the list of
REP({set of prefixes of LPF}) in constant time; and REP(p) in time
O(|p| log log n) for any given pattern p.
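To make the target of the problem concrete, here is a naive baseline (an illustration, not the paper's data structure) that finds the rightmost previous occurrence of a factor by a backward linear scan; the Multi-Layer Suffix Tree exists precisely to avoid this O(n·length) cost:

```python
def rightmost_previous_occurrence(text, i, length):
    """Rightmost start position j < i with text[j:j+length] == text[i:i+length],
    or None if the factor has no earlier occurrence.
    Naive O(n * length) scan; overlapping occurrences are allowed, as in LZ77."""
    pattern = text[i:i + length]
    for j in range(i - 1, -1, -1):
        if text[j:j + length] == pattern:
            return j
    return None
```

The REP relaxation asks only for an occurrence whose *cost* (e.g. bit-length of the offset) matches that of this rightmost one, which is what makes constant-time answers possible.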
A Multidimensional Critical Factorization Theorem
The Critical Factorization Theorem is one of the principal results in combinatorics on words. It relates local periodicities of a word to its global periodicity. In this paper we give a multidimensional extension of it. More precisely, we give a new proof of the Critical Factorization Theorem, but in a weak form, where the weakness is due to the fact that we lose the tightness of the local repetition order. In exchange, we gain the possibility of extending our proof to the multidimensional case. Indeed, this new proof makes use of the Theorem of Fine and Wilf, which has several classical generalizations to the multidimensional case.
On the number of factors of Sturmian words
We prove that for m ⩾ 1, card(A_m) = 1 + Σ_{i=1}^{m} (m−i+1)φ(i), where A_m is the set of factors of length m of all the Sturmian words and φ is the Euler function. This result was conjectured by Dulucq and Gouyou-Beauchamps (1987), who proved that it implies that the language (∪_{m⩾0} A_m)^c is inherently ambiguous. We also give a combinatorial version of the Riemann hypothesis.
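The counting formula for factors of Sturmian words is easy to evaluate directly; the following small sketch (with a deliberately naive totient) computes it:

```python
from math import gcd

def euler_phi(n):
    """Euler's totient: number of 1 <= k <= n with gcd(k, n) == 1 (naive)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def sturmian_factor_count(m):
    """card(A_m) = 1 + sum_{i=1}^{m} (m - i + 1) * phi(i)."""
    return 1 + sum((m - i + 1) * euler_phi(i) for i in range(1, m + 1))
```

The first values are 2, 4, 8, 14: every binary word of length up to 3 is a factor of some Sturmian word, and the first exclusions appear at length 4.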
Minimal forbidden words and factor automata
Let L(M) be the (factorial) language avoiding a given antifactorial language M. We design an automaton accepting L(M) and built from the language M. The construction is effective if M is finite. If M is the set of minimal forbidden words of a single word v, the automaton turns out to be the factor automaton of v (the minimal automaton accepting the set of factors of v). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a non-trivial upper bound on the number of minimal forbidden words of a word.
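To illustrate the notion used here (a brute-force sketch, not the paper's algorithm): a word u is a minimal forbidden word of v when u is not a factor of v but both u minus its first letter and u minus its last letter are factors of v (which, by factoriality, forces all proper factors of u to be factors of v).

```python
from itertools import product

def factors(v):
    """All factors (substrings) of v, including the empty word."""
    return {v[i:j] for i in range(len(v) + 1) for j in range(i, len(v) + 1)}

def minimal_forbidden_words(v):
    """Brute-force the minimal forbidden words of v over the alphabet of v.
    A minimal forbidden word u satisfies |u| - 1 <= |v|, so length |v| + 1 suffices."""
    F = factors(v)
    alphabet = sorted(set(v))
    result = set()
    for n in range(1, len(v) + 2):
        for t in product(alphabet, repeat=n):
            u = "".join(t)
            if u not in F and u[1:] in F and u[:-1] in F:
                result.add(u)
    return result
```

For example, the minimal forbidden words of "abbab" over {a, b} are aa, aba, bbb and babb. The enumeration is exponential in the length bound; the point of the paper is to handle these sets via automata instead.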
Text Compression Using Antidictionaries
We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as "compilers" that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.
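The erasure idea behind antidictionary compression can be sketched in a few lines (a toy illustration over a binary alphabet, assuming a given finite antidictionary; the paper's actual algorithms are automaton-based and linear-time): whenever extending the text read so far with the *other* bit would create a forbidden word, the actual bit is forced and can simply be dropped.

```python
def compress(text, antidictionary):
    """Toy antidictionary compression over {'0', '1'}: drop each bit that is
    forced because the alternative bit would create a forbidden word.
    Assumes text itself avoids every word in the antidictionary."""
    out = []
    for i, c in enumerate(text):
        other = '1' if c == '0' else '0'
        # c is forced (hence omitted) iff some forbidden word is a suffix
        # of text[:i] + other
        forced = any((text[:i] + other).endswith(f) for f in antidictionary)
        if not forced:
            out.append(c)
    return "".join(out)
```

With antidictionary {"11"}, every bit following a 1 is forced to be 0 and disappears from the output. Decompression reverses this using the same antidictionary plus the original length, which is why decompressors are so fast.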
Entropy and Compression: A simple proof of an inequality of Khinchin-Ornstein-Shields
This paper concerns the folklore statement that "entropy is a lower bound
for compression". More precisely, we derive from the entropy theorem a simple
proof of a pointwise inequality first stated by Ornstein and Shields, which
is the almost-sure version of an average inequality first stated by Khinchin
in 1953. We further give an elementary proof of the original Khinchin
inequality that can be used as an exercise for Information Theory students,
and we conclude with historical and technical notes on this inequality.
Comment: Compared to version 1, in version 2 we added a proof, simpler than
the one given by Shields, of a more general theorem (Theorem 4, pg. 7)
presented by Ornstein and Shields. Consequently we also modified the title of
the paper. In version 3 we have reordered the sections of the paper,
simplified the proof of Theorem 4 (now Theorem 3), and significantly
shortened the proof of Theorem 3 (now Theorem 4).
Using Inductive Logic Programming to globally approximate Neural Networks for preference learning: challenges and preliminary results
In this paper we explore the use of Answer Set Programming (ASP), and in particular the state-of-the-art Inductive Logic Programming (ILP) system ILASP, as a method to explain black-box models, e.g. Neural Networks (NNs), when they are used to learn user preferences. To this aim, we created a dataset of user preferences over a set of recipes, trained a set of NNs on these data, and performed preliminary experiments that investigate how ILASP can globally approximate these NNs. Since the computational time required to train ILASP on high-dimensional feature spaces is very high, we focused on the problem of making global approximation more scalable. In particular, we experimented with the use of Principal Component Analysis (PCA) to reduce the dimensionality of the dataset while trying to keep our explanations transparent.
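A dimensionality-reduction step of the kind described can be sketched with a standard SVD-based PCA (an illustration of the technique, not the authors' pipeline or parameters):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.
    Centers the data, then uses the right singular vectors of the
    centered matrix as the principal directions."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T   # shape (n_samples, k)
```

Reducing the feature space this way shrinks the hypothesis space ILASP must search, at the price that the reduced features are linear combinations of the originals, which is exactly the transparency trade-off the paragraph above mentions.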